Specification Analysis of Affine Term Structure Models
Abstract
In this paper, we characterize, interpret, and test the over-identifying restrictions imposed in affine models of the term structure. We begin by showing, using the classification scheme proposed by Dai, Liu, and Singleton [10] for general affine diffusions, that the family of $N$-factor models can be classified into $N+1$ non-nested sub-families of models. For each sub-family, we derive a canonical model with the property that every admissible member of this family is equivalent to, or a nested special case of, our canonical model. Second, using our classification scheme and canonical models, we show that many of the three-factor models in the literature impose potentially strong over-identifying restrictions, and we completely characterize these restrictions. Finally, we compute simulated-method-of-moments estimates for several members of the sub-family of three-factor models that nest the "benchmark" model of Chen [8], and test the over-identifying restrictions on the joint distribution of long- and short-term interest rates implied by these models. We find that affine models of $r$ in which the short rate, its long-run mean, and its stochastic volatility are conditionally uncorrelated fail to simultaneously describe the short and long ends of the yield curve. Relaxing these correlation restrictions leads to a model that passes several goodness-of-fit tests over our sample period. These findings are interpreted in terms of the properties of the risk factors underlying term-structure movements.

Introduction

Recently, considerable attention has been focused on the "affine" class of term structure models (ATSMs), in which the drifts and volatility coefficients of the state-variable processes are affine functions of the underlying state vector (e.g., Duffie and Kan [12]).
ATSMs accommodate potentially rich term-structure dynamics because, in multi-factor models, the conditional variance of each factor can be a positive affine function of all of the factors, the shocks driving the factors may be correlated, and there may be affine dependencies among the factors through their drifts. However, both theoretical and empirical studies of affine models have focused exclusively on seemingly very special cases. For instance, Chen and Scott [9], Pearson and Sun [23], and Duffie and Singleton [14] assume that the short rate is an affine function of a vector of independent, univariate square-root diffusions. Alternatively, the ATSMs in Chen [8] and Balduzzi, Das, Foresi and Sundaram [7], in which the short rate itself is a state variable, assume zero correlations among some of the shocks, and impose strong restrictions on the dependencies among the factors through their drifts and conditional volatilities. Therefore, we are led to inquire:

(Q1) Are these special cases indeed very restrictive, or are they the most flexible specifications of ATSMs that yield well-defined bond prices?
(Q2) If they are restrictive, what are the over-identifying restrictions they impose on yield curve dynamics?
(Q3) In their least restrictive forms, are ATSMs sufficiently flexible to describe simultaneously the historical movements in short- and long-term bond yields?

This paper answers these questions by providing a complete characterization of the admissible, identified multi-factor ATSMs, and then examining empirically the goodness-of-fit of a newly proposed ATSM that nests several popular ATSMs as special cases.

We begin our specification analysis by showing, using the classification scheme of Dai, Liu, and Singleton [10] for general affine diffusions, that all of the $N$-factor ATSMs can be conveniently classified into $N+1$ non-nested sub-families of models.
For each of these $N+1$ sub-families, we derive a canonical model with the property that every other well-defined ATSM within this sub-family is equivalent to, or a nested special case of, the canonical model. Furthermore, all of the extant ATSMs cited above are immediately shown to be restricted special cases of our canonical models. Thus, our answer to Q1 is that extant models do indeed impose potentially strong over-identifying restrictions on term structure movements.

The absence of an all-encompassing ATSM that nests all extant models as special cases is a consequence of the necessity of constraining the parameter space so that bond prices are well defined. This admissibility problem arises because the volatility of the $i$th factor, $Y_i(t)$, is given by $\sqrt{\alpha_i + \beta_i' Y(t)}$ and, therefore, $\alpha_i + \beta_i' Y(t)$ must be positive over the range of $Y(t)$ for bond prices to be well defined (Duffie and Kan [12], Dai, Liu, and Singleton [10]). Whether or not a parameterization is admissible depends jointly on the characteristics of the drift and diffusion coefficients of the state vector $Y(t)$ and, as a consequence, there does not exist an all-encompassing admissible ATSM that nests all previous models. We proceed by classifying the family of $N$-factor ATSMs into $N+1$ sub-families with the property that, for each sub-family, sufficient conditions for admissibility are easily verified. The classification scheme is based on the number of factors that determine the volatilities of all $N$ factors.

Having classified the admissible $N$-factor ATSMs, we next specialize to the case of $N = 3$ and describe in detail the nature of the four canonical models for the three-factor family of ATSMs. From this discussion we see that potentially strong over-identifying restrictions were imposed in all ATSMs in the literature. Moreover, this analysis reveals several new insights into the nature of these restrictions.
Specifically, ATSMs allow for more interdependencies among the factors through their drifts, without jeopardizing admissibility or identification, than has heretofore been recognized. For instance, we can allow for feedback through the drifts of the stochastic long-run mean and volatility factors in the models of Chen [8] and Balduzzi, Das, Foresi and Sundaram [7]. Similarly, there is no need to constrain the drifts of the square-root diffusions in CIR-style models to be independent across factors: correlated square-root diffusions are not inconsistent with admissibility. Furthermore, in the cases of the Chen [8] and Balduzzi, Das, Foresi and Sundaram [7] models, several of the zero restrictions on the correlations among the diffusions can be relaxed. These observations lead to new, as yet unexplored, ATSMs.

This analysis also shows that CIR-style models, which start from the premise that the short rate $r$ is related to the factors as $r(t) = \delta_0 + \delta_y' Y(t)$, implicitly embody many of the qualitative features of models that have $r$ itself as a state variable. For instance, an immediate implication of our classification framework is that CIR-style models are equivalent to models in which one of the factors is the central tendency, or stochastic long-run mean, of $r$. At the same time, we argue that, once one relaxes the over-identifying restrictions in extant models, there is considerably more scope for interdependence among the state variables and correspondingly more ambiguity in interpreting them as long-run mean or volatility factors.

A key feature of ATSMs from the perspective of econometric tractability is that the prices of zero-coupon bonds are exponential affine functions of the state $Y(t)$. We next show how this feature can be exploited to compute simulated method of moments (SMM) estimates (Duffie and Singleton [13] and Gallant and Tauchen [18]) of ATSMs.
Essentially, after selecting a set of moments to be used in estimation, one simulates time series of bond yields implied by the ATSM under investigation and compares the sample moments of the simulated and historical yield series. The parameter estimates are chosen to make these sample moments as close to each other as possible.

We apply this methodology to investigate empirically the goodness-of-fit of one of the four branches of three-factor ATSMs. The particular canonical model we focus on nests the Chen [8] model with stochastic long-run mean and volatility factors as a special case. Using data on short- and long-term swap yields simultaneously, we find that the constrained Chen model does not adequately describe the joint distribution of short- and long-term yields. On the other hand, our more flexible, canonical model passes several diagnostic tests of fit. The key sources of improved fit are our relaxation of constraints on the correlations among the three factors through both their drifts and diffusions. Finally, we extract the values of the three factors implied by our canonical model and interpret them based on their correlations with several commonly used yield-curve risk factors.

The remainder of the paper is organized as follows. Section I defines the affine bond pricing model. Section II presents general results pertaining to the classification, admissibility, and identification of the family of $N$-factor affine term structure models. Section III specializes the classification results to the family of three-factor affine term structure models, and characterizes explicitly the nature of the over-identifying restrictions in extant models relative to our more flexible, canonical models. Section IV explains our estimation strategy and data. Section V presents our empirical results. Finally, Section VI concludes.

I The Affine Bond Pricing Model

Consider a frictionless economy with riskless borrowing and lending opportunities.
Let us fix a standard Brownian motion $W = (W_1, W_2, \ldots, W_N)$ in $\mathbb{R}^N$, restricted to some time interval $[0, T]$, on a given probability space $(\Omega, \mathcal{F}, P)$. We also fix the standard filtration $\mathbb{F} = \{\mathcal{F}_t : t \in [0, T]\}$ of $W$, and let $\mathcal{F} = \mathcal{F}_T$. Assume that (a) the prices of $M$ bonds follow the Ito process $X = (X_1, X_2, \ldots, X_M)$ in $\mathbb{R}^M$,

\[ dX(t) = \mu_X(t)\,dt + \sigma_X(t)\,dW(t), \tag{1} \]

where $\sigma_X(t)$ is an $M \times N$ matrix; (b) the instantaneous short rate process $r(t)$ is measurable with respect to $\mathcal{F}_t$; and (c) there are no arbitrage opportunities. Then, under technical conditions (see Duffie [11] and Hansen and Richard [19]), there exists a state price deflator $\pi(t)$ such that $\pi(t)X(t)$ is a martingale under $P$; i.e., for any time $t$ and $s > t$,

\[ X(t) = E_t\left[\frac{\pi(s)}{\pi(t)}\,X(s)\right]. \tag{2} \]

The ratio $\pi(s)/\pi(t)$ is the stochastic discount factor, or pricing kernel, for pricing the $M$ securities in the absence of arbitrage. By Ito's lemma, it can be shown that the pricing kernel satisfies

\[ \frac{d\pi(t)}{\pi(t)} = -r(t)\,dt - \Lambda(t)'\,dW(t), \tag{3} \]

where $\sigma_X(t)\Lambda(t) = \mu_X(t) - r(t)X(t)$.

The preceding characterization of the pricing kernel process $\pi(t)$ for pricing bonds requires little more than the absence of arbitrage opportunities. The general affine term structure model is obtained by imposing the additional assumptions that

\[ r(t) = \delta_0 + \sum_{i=1}^{N} \delta_i Y_i(t) \equiv \delta_0 + \delta_y' Y(t) \tag{4} \]

and

\[ \Lambda(t) = \sqrt{S(t)}\,\lambda, \tag{5} \]

where $\delta_y = (\delta_1, \ldots, \delta_N)'$ and $\lambda = (\lambda_1, \ldots, \lambda_N)'$ are $N$-vectors of constants. The state variables $Y_i(t)$, $i = 1, 2, \ldots, N$, are assumed to follow the $N$-dimensional stochastic process

\[ dY(t) = K\left(\Theta - Y(t)\right)dt + \Sigma\sqrt{S(t)}\,dW(t), \tag{6} \]

where $Y(t) = (Y_1(t), Y_2(t), \ldots, Y_N(t))'$, and $K$ and $\Sigma$ are $N \times N$ matrices, which may be non-diagonal and asymmetric. $S(t)$ in (5) and (6) is a diagonal matrix with the $i$th diagonal element given by

\[ [S(t)]_{ii} = \alpha_i + \beta_i' Y(t). \tag{7} \]

This characterization of the affine term structure model is the continuous-time, affine counterpart to the formulations of the pricing kernels in Backus and Zin [4] and Backus, Foresi, and Telmer [3].
Our formulation generalizes the continuous-time pricing kernels assumed by Bakshi and Chen [5] and Nielsen and Saá-Requejo [22], and is equivalent to that of Fisher and Gilles [15]. Thus, the subsequent analysis of the specifications of affine term structure models applies to all of these frameworks. Of course, it also applies to equilibrium term structure models that lead to pricing kernels with this affine structure, such as the CIR model.

The time-$t$ price $P(t, \tau)$ of a zero-coupon bond with maturity $\tau$ is given by setting $X(t + \tau) = 1$ in (2):

\[ P(t, \tau) = E_t\left[\frac{\pi(t+\tau)}{\pi(t)}\right], \tag{8} \]

which, by the Girsanov theorem, is equivalent to

\[ P(t, \tau) = E^Q_t\left[e^{-\int_t^{t+\tau} r(u)\,du}\right], \tag{9} \]

where $E^Q_t[\,\cdot\,] = E^Q[\,\cdot \mid \mathcal{F}_t]$ is the expectation with respect to the "risk-neutral" measure $Q$ conditional on the filtration at time $t$. The dynamics of the state variables under $Q$, which are needed in order to evaluate bond prices using (9), are given by

\[ dY(t) = \tilde{K}\left(\tilde{\Theta} - Y(t)\right)dt + \Sigma\sqrt{S(t)}\,d\tilde{W}(t), \tag{10} \]

where $\tilde{W}(t)$ is an $N$-dimensional independent standard Brownian motion under $Q$, $\tilde{K} = K + \Sigma\Phi$, $\tilde{\Theta} = \tilde{K}^{-1}(K\Theta - \Sigma\psi)$, the $i$th row of $\Phi$ is given by $\lambda_i\beta_i'$, and $\psi$ is an $N$-vector whose $i$th element is given by $\lambda_i\alpha_i$.

The risk-neutral drift $\mu(t)$ and diffusion $\sigma(t)$ of $Y(t)$ have the feature that both $\mu(t)$ and $\sigma(t)'\sigma(t)$ are affine functions of $Y(t)$. This assures that the zero-coupon bond prices are log-linear in the state vector $Y(t)$.$^{1}$ Specifically, it can be shown (see Duffie and Kan [12]) that the zero-coupon bond prices are given by

\[ P(t, \tau) = e^{A(\tau) - B(\tau)'Y(t)}, \tag{11} \]

where $A(\tau)$ and $B(\tau)$ satisfy the ordinary differential equations (ODEs)

\[ \frac{dA(\tau)}{d\tau} = -\tilde{\Theta}'\tilde{K}'B(\tau) + \frac{1}{2}\sum_{i=1}^{N}\left[\Sigma'B(\tau)\right]_i^2\,\alpha_i - \delta_0, \tag{12} \]

\[ \frac{dB(\tau)}{d\tau} = -\tilde{K}'B(\tau) - \frac{1}{2}\sum_{i=1}^{N}\left[\Sigma'B(\tau)\right]_i^2\,\beta_i + \delta_y. \tag{13} \]

These ODEs can be solved easily through numerical integration, starting from the initial conditions $A(0) = 0$, $B(0) = 0_{N \times 1}$. Consequently, estimation of models that simultaneously price long- and short-term rates is computationally feasible.
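As a rough illustration of the numerical integration mentioned above, the following sketch integrates the ODEs (12)-(13) with a classical fourth-order Runge-Kutta scheme and compares the result with the known closed-form loadings of a one-factor Gaussian (Vasicek-type) model. The function names, the choice of RK4, the step count, and all parameter values are assumptions made for this example, not part of the paper; the array `beta` is assumed to hold $\beta_i$ in its $i$th column.

```python
import numpy as np

def riccati_rhs(B, K_q, theta_q, Sigma, alpha, beta, delta0, delta_y):
    """Right-hand sides of ODEs (12)-(13); beta holds beta_i in column i."""
    sb2 = (Sigma.T @ B) ** 2                      # [Sigma' B]_i^2 for i = 1..N
    dA = -theta_q @ (K_q.T @ B) + 0.5 * (sb2 @ alpha) - delta0
    dB = -K_q.T @ B - 0.5 * (beta @ sb2) + delta_y
    return dA, dB

def bond_loadings(tau, K_q, theta_q, Sigma, alpha, beta, delta0, delta_y,
                  n_steps=400):
    """Integrate from A(0) = 0, B(0) = 0 with RK4 (an assumed scheme)."""
    args = (K_q, theta_q, Sigma, alpha, beta, delta0, delta_y)
    h = tau / n_steps
    A, B = 0.0, np.zeros(len(delta_y))
    for _ in range(n_steps):
        # dA/dtau depends on B only, so only B needs intermediate stages.
        a1, b1 = riccati_rhs(B, *args)
        a2, b2 = riccati_rhs(B + 0.5 * h * b1, *args)
        a3, b3 = riccati_rhs(B + 0.5 * h * b2, *args)
        a4, b4 = riccati_rhs(B + h * b3, *args)
        A = A + h * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        B = B + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return A, B

# Hypothetical one-factor Gaussian case under Q: r = Y, dY = k(th - Y)dt + s dW.
k, th, s, tau = 0.5, 0.05, 0.01, 5.0
A_num, B_num = bond_loadings(
    tau, K_q=np.array([[k]]), theta_q=np.array([th]), Sigma=np.array([[s]]),
    alpha=np.array([1.0]), beta=np.zeros((1, 1)), delta0=0.0,
    delta_y=np.array([1.0]))

# Closed-form loadings of the one-factor Gaussian model, for comparison.
B_exact = (1.0 - np.exp(-k * tau)) / k
A_exact = (B_exact - tau) * (th - s**2 / (2 * k**2)) - s**2 * B_exact**2 / (4 * k)

# Zero-coupon yield implied by (11) at a hypothetical state Y = 0.04.
yield_tau = -(A_num - B_num @ np.array([0.04])) / tau
```

In an estimation loop one would call `bond_loadings` once per maturity and parameter draw; a production implementation would more likely rely on an adaptive ODE solver.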
Equations (4)-(9) characterize what we will refer to as the general AY representation of a multi-factor, affine bond pricing model.$^{2}$

$^{1}$ Our specification of the state variable dynamics under the real measure is also affine (see (6)). This is not necessary for the log-linearity of zero-coupon bond prices, which only requires that the risk-neutral dynamics of the state variables be affine.

$^{2}$ There is a different formulation of affine models in the literature that starts with the diffusion model for $r(t)$ and adds state variables by allowing the drift and diffusion coefficients of $r$ to depend on unobserved state variables (e.g., Balduzzi, Das and Foresi [6] and Chen [8]). See Section III for a proof that this alternative approach produces a model that is analytically equivalent to a member of the class of affine models examined here.

II A Classification of Admissible ATSMs

Ideally, specification analysis could be conducted with the general affine term structure specification (6). However, this is not possible because, for an arbitrary choice of the parameter vector $(K, \Theta, \Sigma, \mathcal{B}, \alpha)$, the model may not be "admissible" in the sense that a strong solution to (6) may not exist and, hence, zero-coupon bond prices (11) may not be well defined. What constitutes a sufficient set of restrictions for admissibility depends on the specification of the coefficients governing the drift ($K$ and $\Theta$) and the diffusion ($\Sigma$ and $\mathcal{B}$) of $Y(t)$. The trade-offs in flexibility in specifying the drift and diffusion, while maintaining admissibility, imply that there is not a "maximally flexible" and admissible ATSM. Some admissible ATSMs will be non-nested in other ATSMs.

Dai, Liu, and Singleton [10] have proposed a convenient classification scheme for a broad class of affine asset pricing models that facilitates analysis of the admissibility and identification problems.
For the purpose of our specification analysis of ATSMs, we adopt a special case of their scheme with the properties that:

(i) for each class, there is a maximal identified and admissible model, a canonical model, that nests all other admissible models within this class as a special case;
(ii) the subclasses of admissible ATSMs are non-overlapping (the canonical models are non-nested econometrically) and their union is the entire class of admissible ATSMs;
(iii) most of the popular ATSMs in the literature are representable as a restricted special case of one of our canonical models.

Properties (i) and (iii), in particular, provide a framework for comparing the dynamic structures of different affine models and characterizing their over-identifying restrictions.

Throughout this section we focus on the general case of $N$ factors. A more detailed discussion of the implications of our classification scheme for interpreting extant term structure models is presented in Section III for the case of $N = 3$.

Let $\mathcal{A}$ be the set of ATSMs associated with an $N$-dimensional state vector $Y$. Each element of $\mathcal{A}$ is indexed by its associated parameter vector. Following Dai, Liu, and Singleton [10] (hereafter DLS), we classify ATSMs based on the rank of the matrix $\mathcal{B} = (\beta_1, \ldots, \beta_N)$ governing the conditional volatilities of the state vector. For $0 \le m \le N$, we let $\mathcal{A}_m$ be the set of ATSMs for which the column rank of $\mathcal{B}$ is $m$ (we assume that $\mathcal{B}$ has rank 0 if all of its elements are zero). Note that this classification scheme leads to $N + 1$ sub-families of ATSMs that are non-nested and exhaustive of the entire family of ATSMs. Intuitively, our classification scheme checks for the number of $Y$s that determine the volatilities of all $N$ $Y$s, and classifies a model into $\mathcal{A}_m$ if this number is $m$. This classification scheme is natural, because the "admissibility problem" relates directly to the structure of $\sqrt{S(t)}$.
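The rank-based classification just described can be sketched in a few lines. This is only an illustration: the function name `classify` and the numerical loading matrices are hypothetical, and the matrix is assumed to hold $\beta_i$ in its $i$th column.

```python
import numpy as np

def classify(B_mat):
    """Return m, the rank of the volatility loading matrix (column i = beta_i)."""
    return int(np.linalg.matrix_rank(np.asarray(B_mat, dtype=float)))

# Hypothetical three-factor loading matrices, one for each sub-family.
examples = {
    "Gaussian": np.zeros((3, 3)),
    "one volatility factor": np.array([[1.0, 0.4, 0.7],
                                       [0.0, 0.0, 0.0],
                                       [0.0, 0.0, 0.0]]),
    "two volatility factors": np.array([[1.0, 0.0, 0.3],
                                        [0.0, 1.0, 0.2],
                                        [0.0, 0.0, 0.0]]),
    "three square-root factors": np.eye(3),
}
ranks = {name: classify(B) for name, B in examples.items()}
```

With $N = 3$ the four examples fall into the four sub-families $m = 0, 1, 2, 3$, matching the count of $N + 1$ non-nested classes.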
For the volatility $S_{ii}(t)$ to be meaningful, we require that $\alpha_i + \beta_i' Y(t)$ be strictly positive over the range of $Y(t)$. Our strategy for establishing admissibility (and identification) will be to impose sufficient conditions on a model in $\mathcal{A}_m$ to assure that the $m$ $Y$s driving $S(t)$ are always positive. This will assure admissibility of the entire state process $Y(t)$. We show in Section III that this classification scheme also facilitates interpretation of the dynamic structure of many popular three-factor ATSMs.

Adopting this classification scheme, we define the admissible and identified sub-family of $\mathcal{A}_m$ in two steps. First, we introduce a canonical model for $\mathcal{A}_m$ and establish that it is admissible and, given the constraints imposed for admissibility, that it is just-identified. Then the admissible, identified sub-family of $\mathcal{A}_m$ is defined to be all members of $\mathcal{A}_m$ that are equivalent to, or special cases of, our canonical model.

Fixing $m > 0$, without loss of generality we take the first $m$ columns of $\mathcal{B}$ to be linearly independent, and let $Y^{(B)}(t)$ denote the vector of the associated $Y(t)$s. The remaining $(N - m)$ $Y$s are denoted by the vector $Y^{(D)}(t)$, so that $Y(t)' = (Y^{(B)\prime}, Y^{(D)\prime})$. We define the canonical model for $\mathcal{A}_m$ under this classification scheme as:

Definition II.1 (Canonical Model for $\mathcal{A}_m$) The canonical model for $\mathcal{A}_m$ is described by the following parameters. For $m > 0$,

\[ K = \begin{bmatrix} K^{BB}_{m \times m} & 0_{m \times (N-m)} \\ K^{DB}_{(N-m) \times m} & K^{DD}_{(N-m) \times (N-m)} \end{bmatrix}, \tag{14} \]

\[ \Theta = \begin{bmatrix} \Theta^{B}_{m \times 1} \\ 0_{(N-m) \times 1} \end{bmatrix}, \tag{15} \]

\[ \Sigma = I, \tag{16} \]

\[ \alpha = \begin{bmatrix} 0_{m \times 1} \\ 1_{(N-m) \times 1} \end{bmatrix}, \tag{17} \]

\[ \mathcal{B} = \begin{bmatrix} I_{m \times m} & B^{BD}_{m \times (N-m)} \\ 0_{(N-m) \times m} & 0_{(N-m) \times (N-m)} \end{bmatrix}, \tag{18} \]

where the elements of $B^{BD}_{m \times (N-m)}$ are non-negative, and the free parameters in $K$ and $\Theta$ must satisfy

\[ K_i\Theta \equiv \sum_{j=1}^{m} K_{ij}\Theta_j > 0, \quad \text{and} \tag{19} \]

\[ K_{ij} \le 0, \quad 1 \le j \le m, \; j \ne i. \tag{20} \]

For $m = 0$, $K$ is either upper or lower triangular.

That our canonical model is admissible and identified can be shown formally using the analysis of general affine asset pricing models in DLS. The remainder of this section outlines proofs of these assertions for our case of ATSMs.
The key features of an affine model that assure $S_{jj} > 0$, for all $j$, are: (i) the shocks to $Y_i$ cannot lead $S_{jj}$ to become negative, and (ii) the drift of $Y_j$ is such as to pull it away from zero as $Y_j$ approaches zero from above. By definition of our canonical model, the volatilities of all $N$ state variables are positive affine functions of the first $m$ state variables. This is an immediate implication of our assumption that $\mathcal{B}$ is given by (18) and all elements of $\mathcal{B}$ are assumed to be non-negative. Therefore, it is sufficient to verify that $Y^{(B)}(t) > 0$.

By definition, $Y^{(B)}(t)$ follows the process

\[ dY^{(B)}_i(t) = K_i\left(\Theta - Y(t)\right)dt + \sqrt{Y^{(B)}_i(t)}\,dW_i(t). \tag{21} \]

The fact that the volatility of $Y^{(B)}_i(t)$ depends only on itself means that admissibility of $Y^{(B)}(t)$ requires only that each $Y^{(B)}_i(t) \ge 0$. Furthermore, the assumption that the upper-left $m \times m$ sub-matrix of $\Sigma$ is diagonal implies that the diffusions are independent. In particular, shocks to $Y^{(B)}_i(t)$ cannot contribute to a negative $Y^{(B)}_j(t)$ through interactions among their diffusions. Furthermore, the assumption that the upper-right $m \times (N-m)$ block of $K$ is zero ($Y^{(D)}(t)$ does not enter the drift of $Y^{(B)}(t)$) implies that interactions among the drifts of $Y^{(B)}(t)$ and $Y^{(D)}(t)$ cannot induce a negative element of $Y^{(B)}(t)$. Finally, the constraints (19) and (20) on $K^{BB}$ are the generalization of the Feller condition for a univariate square-root diffusion, and they imply that $Y^{(B)}(t)$ will remain non-negative.

Note that the (sufficient) conditions for admissibility of the canonical model for $\mathcal{A}_m$ are that its parameters be given by $\mathcal{B}$ in (18), $\alpha' = (0', \alpha^{D\prime})$,

\[ \Sigma = \begin{bmatrix} \Sigma^{I}_{m \times m} & 0_{m \times (N-m)} \\ \Sigma^{DB}_{(N-m) \times m} & \Sigma^{DD}_{(N-m) \times (N-m)} \end{bmatrix}, \tag{22} \]

where $\Sigma^{I}$ is a diagonal matrix with $\sigma_i > 0$ in the $i$th diagonal position, and

\[ K = \begin{bmatrix} K^{BB}_{m \times m} & 0_{m \times (N-m)} \\ K^{DB}_{(N-m) \times m} & K^{DD}_{(N-m) \times (N-m)} \end{bmatrix}, \tag{23} \]

and that $K^{BB}$ satisfy conditions (19) and (20). We shall refer to these constraints on $\Sigma$ and $K$ as the DLS admissibility conditions.
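The parameter constraints discussed above are easy to check mechanically. The sketch below tests the zero upper-right block of $K$, conditions (19) and (20), and the non-negativity of $B^{BD}$ for a candidate parameterization with $m > 0$. The function name and all numerical values are hypothetical, and condition (19) is read here as applying to each $i = 1, \ldots, m$.

```python
import numpy as np

def satisfies_canonical_constraints(K, Theta, B_mat, m):
    """Check the constraints on K, Theta and the B^{BD} block for a given m > 0."""
    K, Theta, B_mat = (np.asarray(x, dtype=float) for x in (K, Theta, B_mat))
    K_bb = K[:m, :m]
    # Y^(D) must not enter the drift of Y^(B): upper-right block of K is zero.
    no_feedback = np.all(K[:m, m:] == 0.0)
    # Condition (19): sum_{j<=m} K_ij * Theta_j > 0 for each i <= m.
    cond19 = np.all(K_bb @ Theta[:m] > 0.0)
    # Condition (20): off-diagonal elements of K^BB are non-positive.
    cond20 = np.all(K_bb[~np.eye(m, dtype=bool)] <= 0.0)
    # Elements of B^{BD} (upper-right block of the loading matrix) are >= 0.
    bbd_ok = np.all(B_mat[:m, m:] >= 0.0)
    return bool(no_feedback and cond19 and cond20 and bbd_ok)

# Hypothetical three-factor example with m = 2.
K_ok = np.array([[0.5, -0.1, 0.0],
                 [-0.2, 0.4, 0.0],
                 [0.3, 0.1, 0.6]])
Theta = np.array([1.0, 2.0, 0.0])
B_mat = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.2],
                  [0.0, 0.0, 0.0]])
ok = satisfies_canonical_constraints(K_ok, Theta, B_mat, m=2)

# Making one off-diagonal element of K^BB positive violates (20).
K_bad = K_ok.copy()
K_bad[0, 1] = 0.1
bad = satisfies_canonical_constraints(K_bad, Theta, B_mat, m=2)
```

Such a check could serve as a guard inside an optimizer, rejecting parameter draws that leave the admissible region before any bond prices are computed.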
The differences between this admissible parameterization and that of our canonical model are due to normalizations that must be imposed to achieve an identified model. The identification problem arises because we can apply non-trivial transformations to the parameters of $Y(t)$ that change the structure of the state process, but leave the distribution of $r$ unchanged. To fix the scale of $Y(t)$, we normalized the diagonal elements of $\Sigma$ to unity. Even with these normalizations, the level of $Y^{(D)}(t)$ is not identified if $\delta_0$ is free. This source of under-identification was eliminated by normalizing the long-run mean of $Y^{(D)}$ to zero; that is, by setting the lower $(N-m) \times 1$ sub-vector of $\Theta$ to zero.

More interesting for the specification analysis of ATSMs is the fact that the interdependencies between $Y^{(B)}(t)$ and $Y^{(D)}(t)$, and among the $Y^{(D)}(t)$s, are only partially identified. In particular, DLS show that the feedback through the lower $(N-m) \times m$ sub-matrix of $\Sigma$, $\Sigma^{DB}$, and through the sub-matrix $K^{DB}$ are not separately identified.$^{3}$ In our canonical model, we have chosen to normalize $\Sigma^{DB}$ to zero so that $\Sigma$ is diagonal. However, we could equally well have chosen to normalize $K^{DB}$ to zero, in which case $\Sigma^{DB}$ would be a free matrix. Similarly, we can assume that either $\Sigma^{DD}$ or $K^{DD}$ be diagonal, and we have normalized the former to be diagonal.$^{4}$ We will see in Section III that it is sometimes convenient for interpreting affine models to switch back and forth between these equivalent models. These are the minimal normalizations that must be imposed to achieve a just-identified version of our canonical model.

$^{3}$ That $\Sigma^{DB}$ can be normalized to zero follows immediately from the observation that we can transform $Y(t)$ by the affine transformation $T_A$: $(L, 0)$, where

\[ L = \begin{bmatrix} I_{m \times m} & 0_{m \times (N-m)} \\ -\Sigma^{DB}_{(N-m) \times m} & I_{(N-m) \times (N-m)} \end{bmatrix}, \]

and redefine $\delta_y$ as $L'^{-1}\delta_y$ and obtain a model with identical $r$ and bond prices.
$^{4}$ Again, this under-identification problem is easily seen by noting that we can apply an affine transformation $T_A$ that diagonalizes $\Sigma^{DD}$ and redefine $\delta_y$ so that $r$ is unchanged.

We shall use $AYM_m(N)$ to denote this just-identified (maximal) canonical model in branch $\mathcal{A}_m$ (the mnemonic stands for "the canonical representation of $Y$ that is Maximal (just-identified) for sub-family $\mathcal{A}_m$ of all admissible $N$-factor models"). Since $AYM_m(N)$ is the representative model of its own branch, the mnemonic will also be used to refer to the branch itself. It is easy to see that the total number of free parameters in $AYM_m(N)$ is given by $N^2 + N + 1 + m$ for $m > 0$, and $(N+1)(N+2)/2$ for $m = 0$.

One should expect that any model in $\mathcal{A}_m$ that is equivalent to our canonical model, or is a nested special case, will itself be admissible. To formalize this idea, we need to be precise about the class of allowable transformations. We define:

Definition II.2 (Invariant Transformation) An invariant transformation $T$ of an $N$-factor ATSM is an arbitrary combination of an affine transformation $T_A$, a diffusion rescaling $T_D$, and a Brownian motion rotation $T_O$, such that, if $\varphi = (\delta_0, \delta_y, K, \Theta, \Sigma, \{\alpha_i, \beta_i : 1 \le i \le N\}, \lambda)$ is the parameter vector of an ATSM, then the parameter vector $T\varphi$ of the transformed model is given as follows.

If $T = T_A$, then

\[ T\varphi = \left(\delta_0 - \delta_y' L^{-1}\vartheta,\; L'^{-1}\delta_y,\; LKL^{-1},\; \vartheta + L\Theta,\; L\Sigma,\; \{\alpha_i - \beta_i' L^{-1}\vartheta,\; L'^{-1}\beta_i : 1 \le i \le N\},\; \lambda\right), \tag{24} \]

where $L$ is an $N \times N$ non-singular matrix, and $\vartheta$ is an $N \times 1$ vector.

If $T = T_D$, then

\[ T\varphi = \left(\delta_0,\; \delta_y,\; K,\; \Theta,\; \Sigma,\; \{D_{ii}^2\alpha_i,\; D_{ii}^2\beta_i : 1 \le i \le N\},\; D\lambda\right), \tag{25} \]

where $D$ is an $N \times N$ non-singular diagonal matrix.

If $T = T_O$, then

\[ T\varphi = \left(\delta_0,\; \delta_y,\; K,\; \Theta,\; \Sigma O^{T},\; \{\alpha_i,\; \beta_i : 1 \le i \le N\},\; O\lambda\right), \tag{26} \]

where $O$ is an $N \times N$ orthogonal matrix (i.e., $O^{-1} = O^{T}$) that commutes with $S(t)$.

Then any model in $\mathcal{A}_m$ related to our canonical model for $\mathcal{A}_m$ by an invariant transformation is said to be "equivalent" to the canonical model. And any model in $\mathcal{A}_m$ that is a nested special case of our canonical model, possibly after an invariant transformation, is said to be admissible.
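To make the invariance property concrete, the following sketch applies an affine transformation of the form (24), under the assumption that the transformed state is $LY(t) + \vartheta$, and checks numerically that the short rate and the diagonal of $S(t)$ are unchanged while the drift is simply premultiplied by $L$. All names and numerical values are hypothetical; `beta` is assumed to store $\beta_i$ in its $i$th column, and $\lambda$ is omitted because $T_A$ leaves it unchanged.

```python
import numpy as np

def affine_transform(p, L, vartheta):
    """Map a dict of ATSM parameters to its image under T_A as in (24)."""
    L_inv = np.linalg.inv(L)
    return {
        "delta0": p["delta0"] - p["delta_y"] @ L_inv @ vartheta,
        "delta_y": L_inv.T @ p["delta_y"],
        "K": L @ p["K"] @ L_inv,
        "Theta": vartheta + L @ p["Theta"],
        "Sigma": L @ p["Sigma"],
        "alpha": p["alpha"] - p["beta"].T @ L_inv @ vartheta,
        "beta": L_inv.T @ p["beta"],              # column i is beta_i
    }

def short_rate(p, Y):
    return p["delta0"] + p["delta_y"] @ Y

def s_diag(p, Y):
    return p["alpha"] + p["beta"].T @ Y           # [S]_ii = alpha_i + beta_i' Y

def drift(p, Y):
    return p["K"] @ (p["Theta"] - Y)

# Hypothetical three-factor parameters and state.
p = {
    "delta0": 0.01,
    "delta_y": np.array([0.3, 0.2, 1.0]),
    "K": np.array([[0.5, -0.1, 0.0], [-0.2, 0.4, 0.0], [0.3, 0.1, 0.6]]),
    "Theta": np.array([1.0, 2.0, 0.0]),
    "Sigma": np.eye(3),
    "alpha": np.array([0.0, 0.0, 1.0]),
    "beta": np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.0, 0.0, 0.0]]),
}
Y = np.array([0.8, 1.5, -0.1])

# Unit lower-triangular L is non-singular by construction.
L = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.2, 0.3, 1.0]])
vartheta = np.array([0.1, -0.2, 0.05])

p_T = affine_transform(p, L, vartheta)
Y_T = L @ Y + vartheta                            # transformed state vector

same_r = np.isclose(short_rate(p_T, Y_T), short_rate(p, Y))
same_S = np.allclose(s_diag(p_T, Y_T), s_diag(p, Y))
same_drift = np.allclose(drift(p_T, Y_T), L @ drift(p, Y))
```

The check is purely algebraic: it says nothing about whether the transformed parameters still satisfy the admissibility constraints, which has to be verified separately.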
Summarizing, we have shown that the family of $N$-factor ATSMs can be classified into $N + 1$ non-nested sub-families of admissible models. The need to consider non-nested sub-families is dictated by the requirements of the known sufficient conditions for admissibility of affine models. For each family, we have characterized a subset of the models that are admissible and a canonical model that nests all admissible models (as defined by the DLS admissibility conditions). The $N + 1$ canonical models are, by construction, non-nested. Therefore, an exhaustive characterization of the family of $N$-factor ATSMs requires the analysis of $N + 1$ non-nested specifications. We provide such a characterization for the case of $N = 3$ in Section III.

Since the conditions for admissibility are sufficient, but are not known to be necessary, we cannot rule out the possibility that there are admissible, econometrically identified ATSMs that nest our canonical models as special cases. However, the DLS admissibility conditions are quite general. In particular, as we demonstrate in Section III, the proposed canonical models satisfy our initial criterion of nesting most of the ATSMs that have heretofore been examined in the literature.

III Three-Factor ATSMs

In this section we explore in considerably more depth the implications of our classification scheme for the specification of ATSMs. Particular attention is given to interpreting the term structure dynamics associated with our canonical models, and the nature of the over-identifying restrictions that must be imposed on the canonical model to arrive at several models in the literature. To better link up with the empirical term structure literature, we fix $N = 3$ and examine the four associated, non-nested sub-families of ATSMs.

III.A $AYM_0(3)$

If $m = 0$, then none of the $Y(t)$s affect the volatility of $Y(t)$. In other words, the state variables are homoskedastic and $Y(t)$ follows an $N$-dimensional Gaussian diffusion.
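This model is often referred to as the $N$-factor Vasicek [24] model. The elements of the parameter vector for the canonical model $AYM_0(3)$ are given by

\[ K = \begin{bmatrix} \kappa_{11} & 0 & 0 \\ \kappa_{21} & \kappa_{22} & 0 \\ \kappa_{31} & \kappa_{32} & \kappa_{33} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \Theta = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad \alpha = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathcal{B} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \]

where $\kappa_{11} > 0$, $\kappa_{22} > 0$, and $\kappa_{33} > 0$. We arrived at this specification through the following steps. First, though $m = 0$ and our definition of the canonical models for $m \ge 1$ might suggest that $K$ could be free, a full feedback matrix in the drift is not identified. $K$ is normalized to be lower triangular to eliminate the problem of orthogonal rotations (transformation $T_O$ defined in Section II). An equivalent representation of $AYM_0(3)$ has $K$ being diagonal and $\Sigma$ being lower triangular. Finally, notice that $\Theta = 0$, because there are no elements of $Y^{(B)}$.

III.B $AYM_1(3)$

The family $AYM_1(3)$ is characterized by the assumption that one of the $Y$s determines the conditional volatility of all three state variables. The canonical model is given by

\[ K = \left[\begin{array}{c|cc} \kappa_{11} & 0 & 0 \\ \kappa_{21} & \kappa_{22} & \kappa_{23} \\ \kappa_{31} & \kappa_{32} & \kappa_{33} \end{array}\right], \quad \Sigma = \left[\begin{array}{c|cc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right], \quad \Theta = \begin{bmatrix} \theta_1 \\ 0 \\ 0 \end{bmatrix}, \quad \alpha = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathcal{B} = \left[\begin{array}{c|cc} 1 & [\beta_2]_1 & [\beta_3]_1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right], \]

where $\kappa_{11} > 0$, $\kappa_{22} > 0$, $\kappa_{33} > 0$, $\theta_1 > 0$, $[\beta_2]_1 \ge 0$, and $[\beta_3]_1 \ge 0$.

A special case of this sub-family is the model studied by Balduzzi, Das, Foresi and Sundaram [7], hereafter the BDFS model:

\[ dv(t) = \mu\left(\bar{v} - v(t)\right)dt + \eta\sqrt{v(t)}\,dB_v(t), \tag{27} \]

\[ d\theta(t) = \nu\left(\bar{\theta} - \theta(t)\right)dt + \zeta\,dB_\theta(t), \tag{28} \]

\[ dr(t) = \kappa\left(\theta(t) - r(t)\right)dt + \sqrt{v(t)}\,d\hat{B}_r(t), \tag{29} \]

with the only non-zero diffusion correlation being $\mathrm{cov}(dB_v(t), d\hat{B}_r(t)) = \rho_{rv}\,dt$. Rewriting (29) as

\[ dr(t) = \kappa\left(\theta(t) - r(t)\right)dt + \sqrt{v(t)}\,dB_r(t) + \sigma_{rv}\,\eta\sqrt{v(t)}\,dB_v(t), \tag{31} \]

where $\sigma_{rv} = \rho_{rv}/\eta$, and $B_r(t)$ and $B_v(t)$ are independent, gives the BDFS model in the standard notation for ATSMs. The first state variable $v(t)$ is a volatility factor, because it affects the short rate process only through the conditional volatility of $r$. The second state variable $\theta(t)$ is the "central tendency" or stochastic long-run mean of $r$. The short rate mean reverts to its long-run mean $\theta(t)$ at rate $\kappa$.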
The nature of the over-identifying restrictions implied by the BDFSmodel is most easily seen by re-writing it in an equivalent AY form. Applying an invariant transformation TA,5 coupled with a di usion rescaling that makes the diagonal elements of to be 1, we obtain the following equivalent model r(t) = 0 + 1 Y1(t) + Y2(t) + Y3(t); (34) 5TA is given by 0@ Y1(t) Y2(t) Y3(t) 1A = L 1 0 0@ v(t) (t) r(t) 1A #0; (32) 14 d2664 Y1(t) Y2(t) Y3(t) 3775 = 2664 11 j 0 0 0 j 22 0 0 j 0 33 377526642664 1 00 3775 2664 Y1(t) Y2(t) Y3(t) 37753775 dt(35) + 2664 1 j 0 0 21 j 1 23 31 j 32 1 3775vuuuuut2664 S11(t) j 0 0 0 j S22(t) 0 0 j 0 S33(t) 3775dB(t);(36) whereS11(t) = Y1(t); (37) S22(t) = 2 + [ 2]1 Y1(t); (38) S33(t) = 3 + [ 3]1Y1(t): (39) The over-identifying restrictions implied by this model, indicated in square where L0 = 0@ 1 0 0 0 0 0 1 1 1A; #0 = 0@ 0 1A; (33) 15 \boxes," are given by6 0 = =( ); 1 = 0; 21 = 23 = 0; 32 = 1; (41) [ 2]1 = 3 = 0: These constraints imply that the instantaneous short rate is an a ne function of only two of the three state variables ( 1 = 0). This is an implication of the assumption that the volatility factor v(t) enters r only through its volatility and, therefore, it a ects r only indirectly through its a ects on the distribution of (Y2(t); Y3(t)). This is a feature of many of the extant models in the literature, including the AYM2(3) models discussed subsequently and the model of Andersen and Lund [2]. Additionally, Y2(t) ( (t)) is uncorrelated with Y1(t) (v(t)) and Y3(t) through the di usion term. Finally, note that a key reason that the BDFS model is in AYM1(3) is that the volatility of the long-run mean is constant ( (t) is Gaussian). III.C AYM2(3) The family AYM2(3) is characterized by the assumption that the volatilities of Y (t) are determined by a ne functions of two of the three Y s. 
The 6The free parameters of the canonical model are linked to the free parameters of the BDFS model by the relations 11 = ; 22 = ; 33 = ; 1 = v; 31 = rv; 2 = 2 2=( )2; [ 3]1 = 2: (40) With the parametric restrictions freed, (35) is equivalent to AYM2(3). The latter form of the canonical model is obtained by the following invariant transformations. A rescaling would take 2 and 3 to 1, but free up 2 and 3. An a ne transformation (L1; 0), where L1 = 2664 1 j 0 0 21 j 1 0 31 j 0 1 3775 would make block-diagonal, while freeing up the lower left block of K. Another a ne transformation (L2; 0), where L 1 2 = 2664 1 j 0 0 0 j 1 23 0 j 32 1 3775 would make diagonal, while freeing up the entire lower right block of K. 16 canonical model can be represented as K = 2664 11 12 j 0 21 22 j 0 31 32 j 33 3775; = 2664 1 0 j 0 0 1 j 0 0 0 j 1 3775; = 2664 1 2 0 3775; = 2664 001 3775; B = 2664 1 0 j [ 3]1 0 1 j [ 3]2 0 0 j 0 3775; where 11 > 0, 22 > 0, 33 > 0, 12 0, 21 0, 1 > 0, 2 > 0, [ 3]1 0, and [ 3]2 0. In this model, the state vector is partitioned into two subgroups: the rst subgroup, Y1(t) and Y2(t), have square root di usions, while the second group, Y3(t), has a di usion that is the square root of an a ne function of the rst subgroup of state variables with positive coe cients. The number of free parameters is equal to 32 + 3 + 1 + 2 = 15, and the admissible state space is R2+ R. A special case of this sub-family is the \benchmark" model of Chen [8], hereafter the Chen model, which is given by dr(t) = ( (t) r(t))dt+pv(t)dW1(t); d (t) = ( (t))dt + p (t)dW2(t); (42) dv(t) = ( v v(t))dt+ pv(t)dW3(t); with the Brownian motions assumed to be mutually independent. 
This 17 model has the equivalent AY representation7 r(t) = 0 + Y1(t) + Y2(t) + 3 Y3(t); (45) dY (t) = K( Y (t))dt+ pS(t)dW (t); (46) d2664 Y1(t) Y2(t) Y3(t) 3775 = 2664 11 j 0 0 0 j 22 23 0 j 32 33 377526642664 0 2 3 3775 2664 Y1(t) Y2(t) Y3(t) 37753775 dt + 2664 1 j 12 13 0 j 1 0 0 j 0 1 3775vuuuuut2664 S11(t) j 0 0 0 j S22(t) 0 0 j 0 S33(t) 3775dB(t); (47) whereS11(t) = 1 + [ 1]2 Y2(t) + Y3(t); S22(t) = [ 2]2Y2(t); (48) S33(t) = [ 3]3Y3(t): The (eight) over-identifying restrictions (in square boxes) imposed by this 7The TA transformation 0@ Y1(t) Y2(t) Y3(t) 1A = L 1 0 0@ r(t) (t) v(t) 1A #0; (43) where L0 = 0@ 1 1 0 0 0 0 0 1 1A; #0 = 0@ 00 1A; (44) coupled with a di usion rescaling that makes the diagonal elements of to be 1, gives the AY form. 18 model are given by8 0 = = ( ) ; 12 = 1; 3 = 23 = 32 = 13 = 1 = [ 1]2 = 0: (50) Note that, in this form, we have partitioned the state vector in such a way that the \square root" processes form the second subgroup. This demonstrates yet another freedom we have in writing down ATSMs: it does not matter how we label the state variables a priori. To go to the canonical form where the di usion processes form the rst part of the partition, all we need to do is to permute the indices 1 and 3, and correspondingly r(t) and v(t). The ATSM speci ed by (45) { (48) without the restrictions (50) is the maximally exible ATSM within the branch AYM2(3) that nests the Chen model. Since the AYM2(3) branch of the 3-factor ATSMs is the focus of our empirical analysis in Section V, it is instructive to transform this canonical model back into a form with r as one of the state variables. 
Applying an invariant transformation TA gives9 dr(t) = ( r + (t) r(t))dt+ r ( (t))dt + rv ( v v(t)) +q r + (t) + v(t) dW1(t) + r p (t) dW2(t) + rv pv(t) dW3(t); (53) 8The (seven) free parameters in the AY representation are linked to those in original Chen model by 11 = ; 22 = ; 33 = ; 2 = =( ); 3 = v; [ 2]2 = 2 =( ); [ 3]3 = 2: (49) 9TA is given by 0@ r(t) (t) v(t) 1A = L0@ Y1(t) Y2(t) Y3(t) 1A+ #; (51) where,L = 0@ 1 1 3 0 0 0 0 1 1A; # = 0@ 0 00 1A: (52) 19 d (t) = ( (t))dt + v ( v v(t)) + p (t) dW2(t); (54) dv(t) = ( v v(t))dt+ v ( (t)) + pv(t) dW3(t); (55) where, r = 0 + =( ) + 3 v (56) r = 3 32 =( ); rv = 23 + 3( ) r = 1; = [ 1]2 =( ) r = (1 + 12) =( ); rv = 13 + 3 v = 23( )= ; v = 32 =( ) Again, we have indicated the constraints in the Chen model relative to the canonical model in square boxes. Notice that, without the aid of the AY -form, there is no simple way of verifying that (53) is admissible or of determining the most general a ne model that nests the Chen model. Indeed, there are nine extra terms in (53) relative to (42), but there are only eight new degrees of freedom! The canonical AYM2(3) model allows for richer interest rate dynamics than the Chen model along several important dimensions. First, like the BDFS model, the Chen model constrains one of the elements of y to zero. From (56), it follows that the constraint 3 = 0 show up in the unconstrained Chen model (53) as r = 0. In other words, if r is an a ne function of all three state variables, then (t) is no longer naturally interpreted as the stochastic long-run mean of r. Second and as we will see, more importantly, the Chen model unnecessarily constrains the correlations between shocks to the short-rate and (t) and v(t) to zero: r = 0 = rv. The BDFS model, on the other hand, frees up rv. 
Third, the DLS admissibility conditions permit feedback in the drift between $\theta(t)$ and $v(t)$ (nonzero $\kappa_{\theta v}$ and $\kappa_{v\theta}$) so long as these coefficients are not positive.$^{10}$ Nonzero values of these parameters further cloud the interpretation of $\theta(t)$ and $v(t)$ as central tendency and volatility factors, a point to which we return in Section V.

Finally, $\theta(t)$ can enter the volatility of $r$ directly, in which case the short-rate volatility is an affine function of $\theta(t)$ and $v(t)$. All of these restrictions are examined empirically in Section V.

$^{10}$ This is implied by the requirement that $\kappa_{23}$ and $\kappa_{32}$ be non-positive, if $\kappa > \nu$. If $\kappa < \nu$, the state variable $Y_2(t)$ must be negative. An affine transformation switches the sign of $Y_2(t)$, and $\kappa - \nu$ must be multiplied by $-1$ wherever it appears in (56). It then follows, again, that $\kappa_{\theta v}$ and $\kappa_{v\theta}$ must be non-positive.

III.D AYM3(3)

The final sub-family of models has $m = 3$, so that all three $Y$s determine the volatility structure. The canonical model is parameterized as

$$K = \begin{bmatrix} \kappa_{11} & \kappa_{12} & \kappa_{13} \\ \kappa_{21} & \kappa_{22} & \kappa_{23} \\ \kappa_{31} & \kappa_{32} & \kappa_{33} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \Theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}, \quad \alpha = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

where $\kappa_{ii} > 0$ for $1 \le i \le 3$, $\kappa_{ij} \le 0$ for $1 \le i \ne j \le 3$, and $\theta_i > 0$ for $1 \le i \le 3$.

With both $\Sigma$ and $B$ equal to identity matrices, the diffusion term of this model is identical to that in the $N$-factor model based on independent square-root diffusions. The model AYM3(3) is not, however, equivalent to this model, which is often referred to as the CIR model. In particular, in contrast to CIR-style models that assume that $K$ is diagonal, the canonical model allows $K$ to be a full matrix. Thus, the canonical model is a correlated, square-root (CSR) diffusion model. In models with diagonal $B$, the DLS admissibility conditions preclude relaxing the assumption that $\Sigma$ is diagonal. However, these conditions do not constrain $K$ to be diagonal. Rather, admissibility requires only that the off-diagonal elements of $K$ be less than or equal to zero, not that they be zero.
The multi-factor CIR-style models with independent state variables studied by Chen and Scott [9], Pearson and Sun [23], and Duffie and Singleton [14] are nested special cases of this CSR model. Our analysis implies that all empirical implementations of CIR-style models have imposed potentially strong over-identifying restrictions by forcing $K$ to be diagonal. In this three-factor model, a diagonal $K$ implies six over-identifying restrictions.

IV Simulated Method of Moments Estimation of ATSMs

Having characterized the admissible and identified ATSMs, we turn next to the problem of estimation. In contrast to the special cases of Gaussian and independent square-root models, the conditional likelihood function of the state vector $Y(t)$ is not known for general affine models. Therefore, we pursue the method of simulated moments (SMM) proposed by Duffie and Singleton [13] and Gallant and Tauchen [18]. Our estimation strategy can be outlined as follows: (i) select $N$ yields and a set of moments of these yields to be used in estimation, and choose an initial value for the parameter vector $\phi$; (ii) simulate a long time series of observations on the state vector $Y(t)$ using the chosen value of $\phi$; compute the associated time series of model-implied zero-coupon bond prices by solving the Riccati equations (12) and (13) and substituting these weights into (11); then use the simulated zero-coupon prices to compute the $N$ bond yields; (iii) compute sample versions of the selected moments using both the actual historical yields and the simulated yields, and some measure of the distance between them; (iv) finally, adjust $\phi$ and repeat these steps until the historical and simulated moments are as close to each other as possible. A key issue for the SMM estimation strategy is the selection of moments in Step (i).
Gallant and Tauchen [18] have recently shown that the scores of the likelihood function from an auxiliary model that describes the time series properties of bond yields can serve as the moment conditions for the SMM estimator. More precisely, let $y_t$ denote a vector of yields on bonds with different maturities, $x_t' = (y_t', y_{t-1}', \ldots, y_{t-\ell}')$, and let $f(y_t \mid x_{t-1}; \gamma)$ denote the conditional density of $y$ associated with the auxiliary description of the yield data. The score of the log-likelihood function evaluated at the maximum likelihood (ML) estimator of $\gamma$ with sample size $T$ ($\gamma_T$) satisfies

$$\frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial \gamma} \log f(y_t \mid x_{t-1}; \gamma_T) = 0. \tag{57}$$

Under suitable regularity conditions (see Duffie and Singleton [13] and Gallant and Tauchen [18]), as the sample size gets large the sample mean in (57) converges to $E[\partial \log f(y_t \mid x_{t-1}; \gamma)/\partial \gamma]$. It follows that, if the asset pricing model is correctly specified, then the sample mean of the score evaluated at $y$'s simulated from the asset pricing model with structural parameter vector $\phi$ ($\hat{y}_\tau^{\phi}$),

$$\frac{1}{\mathcal{T}} \sum_{\tau=1}^{\mathcal{T}} \frac{\partial}{\partial \gamma} \log f(\hat{y}_\tau^{\phi} \mid \hat{x}_{\tau-1}^{\phi}; \gamma_T), \tag{58}$$

where $\mathcal{T}$ is the simulation size, should also be approximately zero. Thus, by choosing the parameters $\phi$ of the term structure model to make the sample mean in (58) as close to zero as possible, we obtain estimates of the affine term structure model.

The requirements for the SMM estimator to be consistent for $\phi$, beyond the requirement that the auxiliary model have at least as many unknown parameters as the dimension of $\phi$, will be met by many descriptive time series models of bond yields. In particular, consistency of the SMM estimator does not require that the auxiliary model describe the true joint distribution of the discretely sampled bond yields. To select an auxiliary model, we used the Semi-Non-Parametric (SNP) framework proposed by Gallant and Tauchen [18]. Under plausible regularity conditions, an SNP auxiliary model can approximate arbitrarily well the joint conditional distribution of discretely sampled bond yields.
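The score-matching idea behind (57) and (58) can be illustrated with a deliberately small toy problem. The sketch below is not the estimator used in this paper: it assumes a scalar Gaussian AR(1) as the auxiliary model in place of an SNP density, a scalar AR(1) as a stand-in "structural" model with a single parameter `rho`, and an identity weighting matrix in the quadratic form. All function names are hypothetical.

```python
import random


def simulate_ar1(rho, sigma, n, seed):
    """Stand-in 'structural' model: y_t = rho * y_{t-1} + sigma * e_t."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = rho * y + sigma * rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def fit_auxiliary(y):
    """ML fit of the auxiliary model y_t | y_{t-1} ~ N(a + b*y_{t-1}, s2)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    s2 = sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z)) / n
    return a, b, s2


def mean_score(y, gamma):
    """Sample mean of d log f / d gamma, the analogue of (57) and (58)."""
    a, b, s2 = gamma
    x, z = y[:-1], y[1:]
    n = len(x)
    ga = gb = gs = 0.0
    for xi, zi in zip(x, z):
        e = zi - a - b * xi
        ga += e / s2                          # derivative w.r.t. a
        gb += e * xi / s2                     # derivative w.r.t. b
        gs += (e * e - s2) / (2.0 * s2 * s2)  # derivative w.r.t. s2
    return (ga / n, gb / n, gs / n)


def smm_objective(rho, gamma, n_sim=20000, seed=1):
    """Quadratic form in the simulated mean score (identity weighting assumed)."""
    m = mean_score(simulate_ar1(rho, 1.0, n_sim, seed), gamma)
    return sum(mi * mi for mi in m)


observed = simulate_ar1(0.9, 1.0, 2000, seed=42)  # plays the role of the data
gamma_T = fit_auxiliary(observed)
score_at_data = mean_score(observed, gamma_T)     # zero by the ML first-order conditions
objective_true = smm_objective(0.9, gamma_T)
objective_wrong = smm_objective(0.5, gamma_T)
```

In this toy setting the mean score vanishes at the observed data by construction, and the objective is much smaller at the data-generating value of `rho` than at a distant value. A full implementation would minimize the objective over the structural parameters and replace the identity weighting with an estimate of the inverse covariance matrix of the scores.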
Gallant and Long [16] show that, for our term structure model and selection strategy for an auxiliary density $f(y_t \mid x_{t-1}; \gamma)$, the SMM estimator is asymptotically efficient.$^{11}$ That is, we achieve the efficiency of the maximum likelihood estimator for the true conditional distribution of (discretely sampled) bond yields implied by the structural model. It follows that our SMM estimator is more efficient (asymptotically) than the quasi-maximum likelihood estimator proposed recently by Fisher and Gilles [15].

For our illustrations in Section V, $y$ was chosen to be the yields on six-month LIBOR and two-year and ten-year fixed-for-variable rate swaps over the sample period April 3, 1987 to August 23, 1996. The length of the sample period was determined in part by the unavailability of reliable swap data for years prior to 1987. The yields are ordered in $y$ according to increasing maturity (i.e., $y_1$ is the six-month LIBOR rate, etc.). Duffie and Singleton [14] found, for a somewhat shorter sample period, that a two-factor CIR model did not simultaneously describe all three of these yields. One outcome of the subsequent empirical analysis is an assessment of the adequacy of the Chen model (42) and its affine extension (45)-(48) as a description of the swap term structure.

$^{11}$ More precisely, if, for a given order of the polynomial terms in the SNP approximation to the density $f$ described subsequently, the sample size is increased to infinity, and then the order of the polynomial is increased, the resulting SMM estimator approaches the efficiency of the maximum likelihood estimator.

In selecting an SNP approximation to the conditional density of swap yields, we started with a conditional normal distribution for the three bond yields with a linear conditional mean and ARCH specifications of the conditional variances. Then we scaled this conditional normal distribution by polynomial functions of the yields in order to accommodate non-normality of the conditional distribution.
After examining the properties of several auxiliary models (see the discussion later in this section), we selected the auxiliary model with the following conditional density for our empirical analysis:

$$f(y_t \mid x_{t-1}; \gamma) = c(x_{t-1}) \left[ \epsilon_0 + [h(z_t \mid x_{t-1})]^2 \right] n(z_t), \tag{59}$$

where $n(\cdot)$ is the density function of the standard normal distribution, $\epsilon_0$ is a small positive number, $h(z \mid x)$ is a Hermite polynomial in $z$, $c(x_{t-1})$ is a normalization constant, and $x_{t-1}$ is the conditioning set. $z_t$ is the normalized version of $y_t$, defined by

$$z_t = R_{x,t-1}^{-1} (y_t - \mu_{x,t-1}). \tag{60}$$

The shift vector $\mu_{x,t-1}$ is assumed to be linear, with elements that are functions of $L_\mu = 1$ lags of $y$,

$$\mu_{x,t-1} = \begin{pmatrix} \psi_1 + \psi_4 y_{1,t-1} + \psi_7 y_{2,t-1} + \psi_{10} y_{3,t-1} \\ \psi_2 + \psi_5 y_{1,t-1} + \psi_8 y_{2,t-1} + \psi_{11} y_{3,t-1} \\ \psi_3 + \psi_6 y_{1,t-1} + \psi_9 y_{2,t-1} + \psi_{12} y_{3,t-1} \end{pmatrix}. \tag{61}$$

The scale transformation $R_{x,t-1}$ is taken to be of the ARCH($L_r$) form, with $L_r = 2$,

$$R_{x,t-1} = \begin{pmatrix} \tau_1 + \tau_7 |\epsilon_{1,t-1}| + \tau_{25} |\epsilon_{1,t-2}| & \tau_2 & \tau_4 \\ 0 & \tau_3 + \tau_{15} |\epsilon_{2,t-1}| + \tau_{33} |\epsilon_{2,t-2}| & \tau_5 \\ 0 & 0 & \tau_6 + \tau_{24} |\epsilon_{3,t-1}| + \tau_{42} |\epsilon_{3,t-2}| \end{pmatrix}, \tag{62}$$

where $\epsilon_t = y_t - \mu_{x,t-1}$. Thus, the starting point for our SNP conditional density for $y$ is a first-order vector autoregression (VAR), with innovations that are conditionally normal and follow an ARCH process of order two: $n(y \mid \mu_x, \Sigma_x)$, where $\Sigma_{x,t-1} = R_{x,t-1} R_{x,t-1}'$.

More complex conditional densities are accommodated by scaling $n(z_t)$ by the square of the Hermite polynomial $h(z_t \mid x_{t-1})$. In general, $h$ is a polynomial of order $K_z$ in $z_t$, with coefficients that are polynomials of order $K_x$ in $x_{t-1}$, and the conditioning information $x_{t-1}$ consists of $L_p$ lags of $y_t$. We set $L_p = 1$, so that the conditioning information is $x_{t-1} = y_{t-1}$. Additionally, we set $K_z = 4$, with all of the interaction terms suppressed, and $K_x = 0$. With these choices, our $h$ depends only on $z$ and can be represented as

$$h(z_t \mid x_{t-1}) = A_1 + \sum_{l=1}^{4} \sum_{i=1}^{3} A_{3(l-1)+1+i} \, z_{i,t}^{l}. \tag{63}$$

The normalizing constant $c(x_{t-1})$ is the inverse of the integral over $y_t$ of the product of $\epsilon_0 + [h(z_t)]^2$ and $n(z_t)$.
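To make the role of the normalizing constant concrete, the following sketch evaluates a univariate analogue of the density (59) in the normalized variable $z$. It is an illustration under simplifying assumptions, not the three-variable SNP model of the text: there is a single $z$, the polynomial is $h(z) = \sum_k a_k z^k$ with hypothetical coefficients, and the Jacobian of the change of variables from $y$ to $z$ is ignored. The constant is obtained in closed form from the moments of the standard normal distribution.

```python
import math


def normal_pdf(z):
    """Standard normal density n(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gaussian_moment(k):
    """E[Z^k] for Z ~ N(0,1): zero for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    m = 1.0
    for j in range(1, k, 2):
        m *= j
    return m


def poly_square(coef):
    """Coefficients of h(z)^2 given the coefficients of h(z)."""
    out = [0.0] * (2 * len(coef) - 1)
    for i, a in enumerate(coef):
        for j, b in enumerate(coef):
            out[i + j] += a * b
    return out


def snp_constant(coef, eps0):
    """c = 1 / integral of (eps0 + h(z)^2) n(z) dz = 1 / (eps0 + E[h(Z)^2])."""
    h2 = poly_square(coef)
    e_h2 = sum(c * gaussian_moment(k) for k, c in enumerate(h2))
    return 1.0 / (eps0 + e_h2)


def snp_density(z, coef, eps0):
    """Univariate analogue of (59): c * (eps0 + h(z)^2) * n(z)."""
    h = sum(c * z ** k for k, c in enumerate(coef))
    return snp_constant(coef, eps0) * (eps0 + h * h) * normal_pdf(z)


# Hypothetical fourth-order polynomial (K_z = 4) and the eps0 used in the paper.
coef = [1.0, 0.2, -0.1, 0.05, 0.02]
eps0 = 0.01
value_at_zero = snp_density(0.0, coef, eps0)
```

Because $\epsilon_0 > 0$, the density is strictly positive even at the real roots of $h$, which is the numerical motivation for including this constant.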
The state variables are simulated using the Euler approximation of the stochastic differential equation governing the state dynamics. We use five subintervals for each week, and take every fifth simulated observation to construct a simulated data set of size 50,000. The values of the simulated states are adjusted, if necessary, so that the $\alpha_i + \beta_i' Y(t)$ are always nonnegative. Furthermore, the requirements of the Existence Condition are explicitly imposed.$^{12}$

Gallant and Tauchen [18] showed that the simulated SNP scores (i.e., the SNP score function evaluated at the converged values of the SNP parameters and the converged SMM estimators of the structural parameters) are asymptotically normally distributed with zero mean. Thus, individual scores can be tested by forming t-statistics that have a standard normal asymptotic distribution. The minimized value of the GMM criterion function serves as an overall goodness-of-fit statistic with an asymptotic $\chi^2$ distribution and degrees of freedom equal to the difference between the number of SNP parameters and the number of structural parameters.$^{13}$

$^{12}$ The Existence Condition can be reduced to a set of state-independent constraints on the model parameters. Details on how these constraints are implemented for the models studied here may be requested directly from the authors.

$^{13}$ Our implementation of SMM with an SNP auxiliary model differs from many previous implementations by our inclusion of the constant $\epsilon_0$ in the SNP density function.

In selecting our SNP models, we sought to accommodate known features of both the conditional density of $y$ implied by the structural model and the empirical distribution of swap yields, while adhering to the principle of parsimony. There were several considerations that influenced our final choice of SNP model.
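The Euler scheme used to simulate the state variables earlier in this section can be sketched for a single square-root factor. The code below is a minimal illustration, not the authors' simulator: it assumes a scalar process $dY = \kappa(\theta - Y)\,dt + \sqrt{Y}\,dW$ with time measured in years, and it assumes that the nonnegativity adjustment is a simple floor at zero, since the text does not specify the adjustment actually used. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_sqrt_diffusion(kappa, theta, n_weeks, substeps=5, y0=None, seed=0):
    """Euler scheme for dY = kappa*(theta - Y) dt + sqrt(Y) dW, sampled weekly."""
    rng = random.Random(seed)
    dt = 1.0 / (52 * substeps)       # five subintervals per week by default
    y = theta if y0 is None else y0
    weekly = []
    for step in range(1, n_weeks * substeps + 1):
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        y = y + kappa * (theta - y) * dt + math.sqrt(y) * shock
        y = max(y, 0.0)              # assumed adjustment: floor the state at zero
        if step % substeps == 0:     # keep every fifth simulated observation
            weekly.append(y)
    return weekly


# A short path for illustration; the paper's simulated sample has 50,000 observations.
path = simulate_sqrt_diffusion(kappa=0.5, theta=0.04, n_weeks=1000)
```

Simulating on a finer grid than the sampling interval reduces the discretization bias of the Euler approximation, while keeping only every fifth point matches the weekly frequency of the observed yields.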
For ease of notation in the subsequent discussion, we summarize SNP models in terms of the notation s$L_\mu L_r L_p K_z I_z K_x I_x$.$^{14}$

$^{13}$ (continued) Though $\epsilon_0$ is identified if the scale of $h(z \mid x)$ is fixed, Gallant and Long [16] encountered numerical instability in estimating SNP models with $\epsilon_0$ treated as a free parameter. Therefore, we chose to fix both $\epsilon_0$ and the constant term of $h(z \mid x)$ at non-zero constants. It appears that $\epsilon_0$ was set to zero in previous implementations of the SNP model. However, this choice may introduce numerical problems in SMM estimation, because the SNP density is not guaranteed to be strictly positive when $\epsilon_0 = 0$. In our implementations of SMM, we often found that some simulated observations were close to the zeros of the density function. In such cases (even if only for one simulated observation), the SNP scores became nearly singular. This, in turn, caused spurious random spikes in the SMM objective function. This problem was eliminated by setting $\epsilon_0$ to a positive number that is sufficiently small to leave the estimated parameters of the auxiliary model essentially unchanged. All of the empirical results reported in this paper are obtained with $\epsilon_0 = 0.01$.

Consideration 1 (Conditional Means) An implication of the assumption that the state vector $Y$ follows the affine diffusion (6) is that its conditional mean is linear,$^{15}$

$$E[Y(t) \mid Y(t-1), Y(t-2), \ldots] = C_0 + e^{-K} Y(t-1), \tag{64}$$

where $C_0$ is a vector of constants. This model-implied property of $Y$, and the empirical observation that most of the serial dependence of swap yields is well described by a first-order VAR, motivate our choice of $L_\mu = 1$.

There are at least two reasons why the linear structure (64) may not, in fact, be a good approximation to the conditional means of the swap yields
$y$. First, swap yields are related to zero-coupon yields according to the expression (Duffie and Singleton [14])

$$y_t^n = \frac{1 - P(t, n)}{\sum_{j=1}^{2n} P(t, 0.5j)} \tag{65}$$

and, hence, they are not linear functions of $Y$. Of course, swap yields can be approximated by linear functions of $Y$ with state-variable durations as weights. Our empirical findings suggest that this may be a reasonably good approximation for characterizing the properties of $Y$. However, it is an approximation that is not imposed a priori.

Second, the true conditional means of the swap yields and underlying zero yields may be nonlinear. That is, the affine term structure model may be mis-specified. Evidence for nonlinearity in univariate models was presented in Ait-Sahalia [1]. Andersen and Lund [2], on the other hand, found little evidence for nonlinearity in the drift in their multi-factor analysis of a short rate alone.

$^{14}$ $L_\mu$ is the order of the autoregressive specification of the conditional mean; $L_r$ is the order of the ARCH specification of the conditional variance; $L_p$ is the number of lags included in the conditioning set for the Hermite polynomials; $K_z$ is the order in $z$ of the polynomial $h(z \mid x)$, with a positive value indicating non-Gaussian behavior; positive values of $I_z$ indicate suppression of cross-terms in $z$; $K_x$ is the order in $x$ of the polynomial $h(z \mid x)$, with positive values indicating heterogeneity in the conditional density; finally, positive values of $I_x$ indicate suppression of cross-terms in $x$.

$^{15}$ See, for example, Fisher and Gilles [15].

If $K_z = 0$, then the conditional mean of $z_t$, and hence of $y_t$, is linear in the SNP model, regardless of the order of $K_x$. Therefore, to accommodate a nonlinear conditional mean for $y$, we start by allowing $K_z > 0$. With $K_z > 0$ and $K_x = 0$, the structure of the SNP model is that of an ARCH-in-Mean model, where the ARCH is of order $L_r$. More complex forms of nonlinearity for the mean can be accommodated by having both $K_z$ and $K_x$ nonzero.
However, with $K_z > 0$, incrementing $K_x$ by one increases the number of free parameters in the SNP model by $3(3K_z + 1)$. Thus, parsimony suggests some caution in setting these parameters.

Consideration 2 (Conditional Second Moments) The conditional variances of the state variables in the structural model are time varying, and the conditional correlations of the diffusions are nonzero. Though the conditional variances of the state variables in affine models are linear functions of the current state, the pricing relation (65) gives little guidance on the structure of the conditional variances of the swap yields. To accommodate persistence in the volatilities of the swap rates, (62) includes an ARCH(2) transformation of the swap yields ($L_r = 2$).$^{16}$ Additionally, the assumption that $K_z > 0$ allows for more general dependence of the conditional variances on the lagged values of the swap yields than in (62), though still of the ARCH type. Again, complete generality is obtained by letting both $K_z$ and $K_x$ exceed zero.

$^{16}$ Notice that the formulation of ARCH is in terms of the absolute values of the VAR innovations and not their squared values.

Precisely how the relaxation of the assumption of uncorrelated diffusions in affine models will be manifested in the conditional distribution of $y$ is difficult to say. The conditional correlations of the swap yields in the SNP model are influenced by the parameters of $R_{x,t-1}$ and of the polynomial $h(z)$. Therefore, we expect that freeing up the restrictions $\sigma_{r\theta} = \sigma_{rv} = 0$ and $\kappa_{\theta v} = \kappa_{v\theta} = 0$ will result in improved diagnostics with regard to the scores for the SNP parameters $A_j$ and $\tau_j$ associated with $h(z)$ and $R_{x,t-1}$.

Consideration 3 (Non-normal Innovations) Though the conditional means of discretely sampled affine diffusions are linear, the innovations are non-normal. For instance, it is well known that the innovations implied by the square-root model are non-central chi-square. To capture this non-normality, we choose $K_z > 0$.
Given the importance of $K_z > 0$ for accommodating the non-normality of the normalized swap yields ($z_t$) and allowing for some nonlinearity of the conditional mean of $y$, we set $K_z = 4$. To keep the number of free parameters manageable, we also suppressed all of the cross terms in the polynomials in $z$.

As noted above, setting $K_x > 0$ with $K_z > 0$ substantially increases the dimensionality of the parameter space. In fact, using the Schwarz model selection criterion (BIC), the SNP model s1214300 (BIC = 3.93100) was clearly preferred over s1214310 (BIC = 3.778). Therefore, we chose to set $K_x = 0$. The deterioration in BIC with $K_x > 0$ was not a consequence of having $K_z = 4$. A similar deterioration occurred with $K_z = 2$ when $K_x$ was increased from 0 to 1. This lends further support to our view that $K_x > 0$ was not essential for characterizing the conditional distribution of swap yields for our sample period. Nevertheless, to provide some assurance that our conclusions are insensitive to the assumption $K_x = 0$, we also fit all of the affine models using the SNP model s1111010 ($K_z = 1$; $K_x = 1$). All of the specification tests were qualitatively identical to those with the model s1214300.

In light of all of these considerations, we chose to report results for the auxiliary model s1214300. Maximum likelihood estimators for the parameter vector

$$\gamma = (A_j : 2 \le j \le 13; \; \psi_j : 1 \le j \le 12; \; \tau_j : j = 1, 2, \ldots, 7, 15, 24, 25, 33, 42) \tag{66}$$

are given in Table I. Note that $A_1$ is normalized to 1. Several of the $A_j$ parameters governing the non-normality of the "innovation" $z_t$ and the $\tau_j$ governing conditional heteroskedasticity are significantly different from zero at conventional significance levels.

V Specification Tests of the AYM2(3) Branch

As an illustration of our classification results and proposed estimation method, we investigate which features of the dynamic structure of AYM2(3) models are important for describing the term structure of swap yields.
Toward this end, we first estimate the benchmark Chen model and then progressively relax its implied constraints relative to the canonical model for AYM2(3). The relative magnitudes of the $\chi^2$ statistics for each model relative to the next, progressively more flexible alternative, though not independent of the order in which we relax the constraints, turn out to be very informative about which of the constraints are most binding on the data. That is, we use the difference between the minimized values of the SMM criterion functions for two "adjacent" nested models to evaluate each set of constraints as they are relaxed.$^{17}$

The results are summarized in Table II. The benchmark Chen model, Model 1, gives a $\chi^2$ statistic of 129.027 with 26 degrees of freedom, so the model is strongly rejected. Model 2 relaxes the constraint on $\delta_0$ given in (50). With the long-run means of the three state variables "fixed" at 0, $\bar{\theta}$, and $\bar{v}$, respectively, a free $\delta_0$ gives the model the freedom to shift the level of the short rate upward or downward. The $\chi^2$ drops only slightly, to 128.520 with 25 degrees of freedom. Thus, the incremental improvement in $\chi^2$ does not support introducing a new degree of freedom and, overall, Model 2 is also rejected at conventional significance levels.$^{18}$

Model 3 is obtained from Model 2 by relaxing the constraints $\sigma_{12} = -1$ and $\sigma_{13} = 0$. In terms of the Ar representation of $r$, this gives the model

$^{17}$ This statistic is asymptotically distributed as $\chi^2(q)$, where $q$ is the number of parametric restrictions that give the null model as a special case of its less constrained alternative (Newey and West [20] and Gallant and Tauchen [17]).

$^{18}$ $\delta_0$ is equivalent to the "shift" parameter introduced by Pearson and Sun [23] and Duffie and Singleton [14] in the CIR models, who found the parameter significant.
The difference arises from the fact that the short rate here is allowed to be slightly negative even with $\delta_0 = 0$, whereas in the CIR models without the shift, the short rate is constrained to be positive. Even though it is widely believed that the interest rate ought not to be negative, there appears to be consistent evidence in the literature that models that allow the short rate to be slightly negative with positive probability do a better job of explaining yield curve dynamics. One interpretation might be that all of these models are misspecified in some way. A second interpretation is that there are missing factors that are needed to explain the very short-term rates, but that are not central to explaining the longer-term rates that we examine.

complete freedom in setting the instantaneous correlations between the short rate and the central tendency and volatility factors $\theta(t)$ and $v(t)$. With this change, the $\chi^2$ statistic drops dramatically to 22.891, with 23 degrees of freedom and a p-value of 46.527%. This result provides strong evidence that the correlation restrictions in the Chen model are key reasons that the model fails. More in-depth interpretations of this finding are provided subsequently when we examine the properties of the implied values of $Y(t)$.

Next we free up the constraint $[\delta_y]_3 = 0$, which allows the third state variable to enter the short rate directly through the relation $r(t) = \delta_0 + \delta_y' Y(t)$. The incremental reduction in $\chi^2$ is very small, so $[\delta_y]_3 = 0$ is not rejected by the data. By itself, this finding supports an interpretation of the third state variable as a volatility factor in the sense that the diffusions of $Y_2(t)$ and $Y_3(t)$ ($\theta(t)$ and $v(t)$) are uncorrelated and $\delta_3 = 0$. However, $Y_3(t)$ is a pure volatility factor only if its only role is in determining the volatility of $r$. This will be true (when $\delta_3 = 0$) only if there is no feedback through the drift between $Y_2(t)$ and $Y_3(t)$: $\kappa_{23} = 0$ and $\kappa_{32} = 0$.
A test of these two constraints on $K$ is provided by the difference between the $\chi^2$ statistics for Models 4 and 5, which is a $\chi^2(2)$ of 11.7. As we will see later, the significant reduction in the $\chi^2$ statistic can be attributed to a statistically significant $\kappa_{23}$, which implies (see (56)) that both $\kappa_{\theta v}$ and $\kappa_{rv}$ are significant. From (53), we see that $v(t)$ affects the conditional mean of the short rate through $\kappa_{rv}$. Thus, $v(t)$ is no longer a pure volatility factor, and $\theta(t)$ is no longer a pure "central tendency" factor for $r(t)$.

Finally, the maximally flexible, just-identified model in AYM2(3) is obtained from Model 5 by relaxing $\alpha_1 = 0$ and $[\beta_1]_2 = b_{21} = 0$. The incremental decline in the overall $\chi^2$ statistic is only 0.507, suggesting that these constraints do not significantly hinder the fit of Model 5. Note that $[\beta_1]_2$ creates a "level effect" for the volatility of the short rate. The test statistic suggests that the level effect is statistically insignificant.

The point estimates for the parameters of the most constrained model, the Chen model (Model 1), and the canonical model AYM2(3) (Model 6) are displayed in columns 2 and 6 of Table III, along with their associated standard errors in the adjacent columns. Examination of the estimates for AYM2(3) shows that many of them are insignificantly different from zero at conventional significance levels. This, in turn, suggests that the canonical model may be over-parameterized. Therefore, with our findings in Table II in mind, we estimated the intermediate case, denoted AYO2(3), obtained by imposing the constraints $[\delta_y]_3 = 0$, $\sigma_{12} = -1$, $\alpha_1 = 0$, and $[\beta_1]_2 = 0$ on AYM2(3). The results are reported in the fourth and fifth columns of Table III. Most of the estimated parameters are statistically different from zero. Furthermore, the model AYO2(3) is not rejected against the alternative AYM2(3).

Additional goodness-of-fit statistics are provided by the SNP scores.
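The nested $\chi^2$ difference tests reported in this section can be checked with a few lines of arithmetic. The sketch below uses only statistics quoted in the text and relies on the closed-form survival functions of the $\chi^2$ distribution with one and two degrees of freedom; the function names are hypothetical, and the degrees of freedom for each comparison are inferred from the differences in the reported model degrees of freedom.

```python
import math


def chi2_sf_1df(x):
    """P(chi2(1) > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """P(chi2(2) > x) = exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Overall chi-square statistics reported for Models 1 to 3.
chi2_model1, chi2_model2, chi2_model3 = 129.027, 128.520, 22.891

d12 = chi2_model1 - chi2_model2  # one restriction relaxed (free delta_0)
d23 = chi2_model2 - chi2_model3  # two restrictions relaxed (sigma_12, sigma_13)
d45 = 11.7                       # reported chi2(2) for kappa_23 = kappa_32 = 0

p12 = chi2_sf_1df(d12)  # large p-value: freeing delta_0 is not supported
p23 = chi2_sf_2df(d23)  # essentially zero: correlation restrictions are rejected
p45 = chi2_sf_2df(d45)  # well below 1%: drift feedback restrictions are rejected
```

These p-values agree with the qualitative conclusions in the text: the shift parameter adds little, whereas the correlation restrictions and the drift feedback restrictions of the Chen model are strongly rejected.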
Table IV presents the sample SNP scores computed from three converged structural models (Chen, AYO2(3), and AYM2(3)), along with t-statistics for the null hypotheses that each of the scores is zero. About half of these statistics for the Chen model support rejection of the model, while there is little evidence against the models AYO2(3) and AYM2(3). The scores with the largest t-statistics for the Chen model are those associated with the parameters of the conditional variance (the $\tau$'s), which reinforces the conclusion that the Chen model fails to capture the volatility/correlation structure of swap yields.

The parameter estimates for AYO2(3) displayed in Table III confirm that a nonzero correlation between the short rate and the third factor $v$ ($\sigma_{rv} \ne 0$) is an important ingredient in explaining swap yields. (Recall that $\sigma_{12} = -1 \Rightarrow \sigma_{r\theta} = 0$.) The other source of factor correlations is the interdependence between $\theta(t)$ and $v(t)$ through their drifts. The values of $\kappa_{32}$ and $\kappa_{23}$ relative to their standard errors suggest that it is the feedback from $v(t)$ to $\theta(t)$ that contributes to the improved fit of AYO2(3) over the Chen model.

Further insights into the improved fit of the AYO2(3) model come from examination of the time series of implied state variables. These are computed by inverting the model for the values of $Y(t)$ that give the observed swap rates at each date.$^{19}$ The state variables in two-factor models are typically interpreted as "level" and "slope" factors based either on the factor loadings in principal components analyses (e.g., Litterman and Scheinkman [21]) or on the properties of the implied state variables (e.g., Duffie and Singleton [14]). Litterman and Scheinkman [21] found that their third principal component had loadings that are suggestive of a "curvature" factor. To confirm that the $Y$'s in our best model have similar interpretations, we computed the implied state variables $\hat{Y}$ from AYO2(3) and compared the $\hat{Y}$ to various linear combinations of the swap yields (Figures 1-3).
$\hat{Y}_2$ is plotted against Level, defined as the ten-year swap yield. $\hat{Y}_3$ is plotted against Slope, defined as the difference between the ten- and two-year swap rates. The third observed factor we considered was Butterfly, defined as the residual from the regression of the two-year swap yield on the six-month LIBOR and ten-year swap yields. In order to compare Butterfly with a comparable fitted state variable, we examined the residual$^{20}$ from the regression of $\hat{Y}_1$ on $\hat{Y}_2$ and $\hat{Y}_3$. All time series are standardized by subtracting their means and scaling by their standard deviations. Figure 1 shows that (the orthogonalized) $\hat{Y}_1$ is highly correlated with Butterfly and, hence, represents a curvature factor.$^{21}$ From the other two figures we see that $\hat{Y}_2$ behaves like a Level factor, and $\hat{Y}_3$ behaves like a Slope factor. Thus, the three state variables are the dynamic counterparts to the risk factors typically identified in principal component analyses.

$^{19}$ More precisely, the state variables from a given model are the particular realizations of the state variables that let the model price the six-month, two-year and ten-year yields exactly, using the SMM parameter estimates. If the model is correctly specified, then the implied state variables associated with the model embody the correct assumed factor dynamics, except for sampling errors.

The coupon bond yields are nonlinear functions of the state variables. If a linear approximation of these nonlinear functions is adequate, we should expect the implied state variables to be linear combinations of the yield curve risk factors, which, by construction, are linear functions of coupon bond yields. This is indeed the case. What is somewhat surprising is that each of the three implied state variables corresponds almost exclusively to one of the three yield curve risk factors, rather than to a linear combination of the risk factors with comparable weights. To understand this, we note that the models we work with have special structures.
First, the first state variable does not have feedback in the conditional mean with the other two state variables. This allows it to pick up one of the time scales. Second, the second and third state variables have a diagonal covariance sub-matrix (BB), and the admissibility condition on the feedback sub-matrix K(BB) makes it "essentially" diagonal, in that the off-diagonal elements of K(BB) help only to strengthen the mean reversion of the square-root processes, so that the mean reversion behavior of each square-root process is primarily determined by the corresponding diagonal element of K(BB). Thus the diagonal elements of the feedback matrix pick up the three time scales relatively cleanly, as do the three yield curve risk factors.

20 Note that it is the same as the residual of the regression of the short rate on Ŷ2 and Ŷ3.

21 Within the AYO2(3) model, the correlation between Ŷ1 and Butterfly is 0.991, whereas the correlations between Ŷ1 and Slope and Level are 0.633 and 0.472, respectively. The correlations of Ŷ2 with Butterfly, Slope, and Level are 0.280, 0.065, and 0.969, respectively. Ŷ3 has a correlation of 0.899 with Slope, and correlations of 0.832 and 0.245 with Butterfly and Level, respectively.

The remaining question is why the first state variable picks up the shortest time scale, the second the longest, and the third the medium. Some practitioners have documented evidence that the term premium is positively correlated with yield volatilities. This corroborates our finding (or the other way around) that the volatility factor is highly correlated with the slope factor. The usual argument for the volatility being associated with the curvature factor derives from the intuition of one-factor models, in which the volatility generates a convexity effect, which explains part of the curvature. The convexity effect is still there; however, it is secondary to the slope (term premium) effect.
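The comparison between implied state variables and observable yield-curve factors described above (Level as the ten-year yield, Slope as the ten-year minus the two-year yield, Butterfly as a regression residual, and the orthogonalized first state variable) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function and argument names (`observed_factors`, `factor_correlations`, `libor_6m`, `swap_2y`, `swap_10y`, `y1_hat`, and so on) are hypothetical, each regression is assumed to include a constant, and the implied state variables are taken as given rather than obtained by inverting the model.

```python
import numpy as np

def standardize(x):
    """Subtract the mean and scale by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def ols_residual(y, regressors):
    """Residual from an OLS regression of y on a constant and the regressors.

    Including a constant is an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    cols = [np.ones(len(y))] + [np.asarray(r, dtype=float) for r in regressors]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def observed_factors(libor_6m, swap_2y, swap_10y):
    """Level, Slope and Butterfly built from three observed yields."""
    level = np.asarray(swap_10y, dtype=float)
    slope = level - np.asarray(swap_2y, dtype=float)
    # Butterfly: part of the two-year yield not explained by the two ends.
    butterfly = ols_residual(swap_2y, [libor_6m, swap_10y])
    return level, slope, butterfly

def factor_correlations(y1_hat, y2_hat, y3_hat, libor_6m, swap_2y, swap_10y):
    """Correlations between (standardized) implied states and observed factors."""
    level, slope, butterfly = observed_factors(libor_6m, swap_2y, swap_10y)
    # Orthogonalize the first implied state with respect to the other two.
    y1_orth = ols_residual(y1_hat, [y2_hat, y3_hat])

    def corr(a, b):
        return float(np.corrcoef(standardize(a), standardize(b))[0, 1])

    return {
        "Y1_orth~Butterfly": corr(y1_orth, butterfly),
        "Y2~Level": corr(y2_hat, level),
        "Y3~Slope": corr(y3_hat, slope),
    }
```

Standardizing does not change a correlation coefficient; it is kept here only because the plotted series are standardized in the same way.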
The roles of the first and second state variables are distinguished by their volatility specifications. The volatility of the second state variable is driven by itself, ensuring that its factor loading is approximately CIR-like, that is, monotonically downward sloping. Thus, it cannot be the curvature factor. The volatility of the first state variable is driven by the third state variable, allowing two time scales (one in its own mean reversion, the other in the mean reversion of the volatility factor) to operate in such a way that it can produce a curvature effect. Roughly speaking, the mean reversion of the first state variable dominates the left wing of the butterfly, while the mean reversion of the volatility factor dominates the right wing of the butterfly. That is why the mean reversion coefficient of the first state variable is necessarily larger than that of the volatility factor, or the slope factor. It is then not surprising that the second state variable picks up the longest time scale and is highly correlated with the level factor.

There is a potentially important difference between the specification tests conducted here and those based on the implied bond yields in previous studies of affine models. In the case of CIR-style models, square-root diffusions were estimated by maximum likelihood, the model was "inverted" to obtain fitted state variables as functions of the data and the maximum likelihood parameter estimates, and then the moments of the implied bond yields were compared to the corresponding moments of the actual yields (the data). For example, Pearson and Sun [23] and Duffie and Singleton [14] assess the goodness-of-fit in their models by regressing actual bond yields on the implied bond yields and testing whether or not the intercept is zero and the slope coefficient is unity.
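The regression-based check mentioned above (actual yields regressed on implied yields, with a joint test of zero intercept and unit slope) can be illustrated with a minimal sketch. It is not the procedure of the cited studies: it assumes homoskedastic, serially uncorrelated errors, which yield data are unlikely to satisfy, and the function name `wald_intercept_zero_slope_one` and the constant `CHI2_2DF_5PCT` are hypothetical.

```python
import numpy as np

# 5% critical value of a chi-square distribution with 2 degrees of freedom.
CHI2_2DF_5PCT = 5.991

def wald_intercept_zero_slope_one(actual, implied):
    """OLS of actual on implied yields and a Wald statistic for (a, b) = (0, 1).

    Uses the classical OLS covariance matrix, i.e. it assumes i.i.d. errors
    (an assumption of this sketch, not of the original studies).
    """
    y = np.asarray(actual, dtype=float)
    x = np.asarray(implied, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical covariance of beta
    diff = beta - np.array([0.0, 1.0])        # deviation from the null
    wald = float(diff @ np.linalg.solve(cov, diff))
    return beta, wald

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    implied = 6.0 + rng.normal(size=500)
    actual = 0.5 + 0.8 * implied + 0.1 * rng.normal(size=500)
    beta, wald = wald_intercept_zero_slope_one(actual, implied)
    print("intercept, slope:", beta)
    print("reject at 5%:", wald > CHI2_2DF_5PCT)
```

Under the null and the stated assumptions, the statistic is asymptotically chi-square with two degrees of freedom, so values above `CHI2_2DF_5PCT` would count as evidence against the model at the 5% level.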
In the context of an N-factor affine model in which N of the bond yields are assumed to be priced exactly, the implied state variables will, by construction, exactly price N of the bond yields.22 So the empirical and implied distributions of these N yields must be identical. Moreover, when the number of yields M is larger than N, the information in the data enters the implied distributions of the other M − N yields in two ways: indirectly through the ML estimates of the model, and directly through the inversion of the model, observation by observation, to compute the implied state variables from the N yields that are fit exactly.

In contrast, the empirical and simulated distributions will generally differ for all M yields. This is because the information in the actual data enters only indirectly through the SMM parameter estimates. The values of the simulated moments are otherwise determined only by the structure of the state-variable process and the choice of risk premiums. For the purpose of evaluating the characteristics of the distributions of bond yields implied by an affine model, it is the simulated distribution that is most relevant. At a practical level, it is a close correspondence between the simulated and actual distributions that is desirable for pricing options on bonds by Monte Carlo.

Another potentially informative use of simulated bond yields is an assessment of the effects of relaxing parameter constraints on the distributions of yields implied by the model. We illustrate this possibility, as well as the fact that simulated and empirical distributions are different even with M = N, by examining the mean swap rates. Figure 4 plots the means of the simulated swap rates for the Chen, AYO2(3), and AYM2(3) models against the observed mean swap yield curve. Consistent with the overall goodness-of-fit statistics, the differences between sample and simulated mean swap rates are smallest for the AYO2(3) model and largest for the Chen model.
Notice also that AYM2(3) does worse than AYO2(3). This is consistent with our interpretation that AYM2(3) is over-parameterized. While the average two- and ten-year yields are fit closely, AYO2(3) appears not to capture the curvature of the curve on average. In fact, the convexity effect seems to go the wrong way if we extrapolate the curve beyond ten years. Other models overstate the average slope of the swap yield curve.

22 This was true of the models in Chen and Scott [9], Pearson and Sun [23], and Duffie and Singleton [14], and is also true of our model, as the number of state variables equals the number of bond yields.

VI Conclusion

In this paper we presented a complete characterization of the admissible and identified affine term structure models, according to the most general known sufficient conditions for admissibility. For N-factor models, there are N + 1 non-nested classes of admissible models. For each class, we characterized the "maximally flexible" canonical model and the nature of the admissible factor correlations and conditional volatilities that these canonical models can accommodate. We then applied this classification scheme to the family of three-factor affine term structure models in order to characterize the over-identifying restrictions implicit in several of the more popular affine term structure models in the literature.

Finally, a thorough empirical investigation of one of the four branches of the three-factor family of affine models was carried out to evaluate the goodness-of-fit of models with the long-run mean and volatility of the short rate following independent affine diffusions. We found that the correlation restrictions implicit in these models were strongly rejected by the data. One reason this may not have been apparent from previous studies is that empirical studies of affine models of the short rate have typically used data on the short rate alone to estimate multi-factor models.
In contrast, we fit our models using data on bonds with three different maturities. Our findings also suggest that the drift of the instantaneous short rate is more complicated than simply the short rate mean reverting to a stochastic long-run mean. Even though the maximal model within the branch we investigated is not rejected at conventional significance levels, it is interesting to know how models from other branches stack up against the data. The comparative properties of affine models along different branches will be explored in future research.

References

[1] Y. Ait-Sahalia. Testing Continuous-Time Models of the Spot Interest Rate. Review of Financial Studies, 9(2):385–426, 1996.
[2] T. G. Andersen and J. Lund. Stochastic Volatility and Mean Drift In the Short Term Interest Rate Diffusion: Sources of Steepness, Level and Curvature in the Yield Curve. Working paper, February 1996.
[3] D. Backus, S. Foresi, and C. Telmer. Affine Models of Currency Prices. Working paper, New York University, 1996.
[4] D. K. Backus and S. E. Zin. Reverse Engineering the Yield Curve. NBER Working Paper #4676, 1994.
[5] G. S. Bakshi and Z. Chen. Asset Pricing without Consumption or Market Portfolio Data. Working paper, 1997.
[6] P. Balduzzi, S. R. Das, and S. Foresi. The Central Tendency: A Second Factor in Bond Yields. Working paper, 1995.
[7] P. Balduzzi, S. R. Das, S. Foresi, and R. Sundaram. A Simple Approach to Three Factor Affine Term Structure Models. Journal of Fixed Income, 6:43–53, December 1996.
[8] L. Chen. Stochastic Mean and Stochastic Volatility - A Three-Factor Model of the Term Structure of Interest Rates and Its Application to the Pricing of Interest Rate Derivatives. Blackwell Publishers, 1996.
[9] R. Chen and L. Scott. Maximum Likelihood Estimation For a Multifactor Equilibrium Model of the Term Structure of Interest Rates. Journal of Fixed Income, 3:14–31, December 1993.
[10] Q. Dai, J. Liu, and K. Singleton. Admissibility and Identification of Affine Asset Pricing Models. Working paper, Stanford University, 1997.
[11] D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, 2nd edition, 1996.
[12] D. Duffie and R. Kan. A Yield-Factor Model of Interest Rates. Mathematical Finance, 6(4):379–406, October 1996.
[13] D. Duffie and K. Singleton. Simulated Moments Estimation of Markov Models of Asset Prices. Econometrica, 61:929–952, 1993.
[14] D. Duffie and K. Singleton. An Econometric Model of the Term Structure of Interest Rate Swap Yields. Journal of Finance, forthcoming, 1996.
[15] M. Fisher and C. Gilles. Estimating Exponential Affine Models of the Term Structure. Working paper, 1996.
[16] A. R. Gallant and J. R. Long. Estimating Stochastic Differential Equations Efficiently by Minimum Chi-Square. Biometrika, forthcoming, 1997.
[17] A. R. Gallant and G. Tauchen. Specification Analysis of Continuous Time Models in Finance. Modeling Stock Market Volatility: Bridging the Gap to Continuous Time, Peter Rossi, ed., 1996.
[18] A. R. Gallant and G. Tauchen. Which Moments to Match? Econometric Theory, 12:657–681, 1996.
[19] L. P. Hansen and S. F. Richard. The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models. Econometrica, 55(3):587–613, May 1987.
[20] W. K. Newey and K. D. West. Hypothesis Testing with Efficient Method of Moment Estimation. International Economic Review, 28(3), October 1987.
[21] R. Litterman and J. Scheinkman. Common Factors Affecting Bond Returns. Journal of Fixed Income, 1:54–61, 1991.
[22] L. T. Nielsen and J. Saá-Requejo. Exchange Rate and Term Structure Dynamics and the Pricing of Derivative Securities. INSEAD working paper, 1993.
[23] N. D. Pearson and T. Sun. Exploiting the Conditional Density in Estimating the Term Structure: An Application to the Cox, Ingersoll, and Ross Model. Journal of Finance, XLIX(4):1279–1304, September 1994.
[24] O. Vasicek. An Equilibrium Characterization of the Term Structure. Journal of Financial Economics, 5:177–188, 1977.
Table I: Parameter Estimators for SNP Model: s1214300 0

Parameter   Estimate    STD        t-ratio     p-value
A2           0.14072    0.15618     0.90100    36.76%
A3           0.04042    0.05218     0.77500    43.83%
A4           0.09981    0.07455     1.33900    18.06%
A5          -0.29095    0.09579    -3.03700     0.24%
A6           0.02513    0.03418     0.73500    46.23%
A7          -0.10982    0.04492    -2.44500     1.45%
A8          -0.01392    0.03543    -0.39300    69.43%
A9           0.00340    0.00979     0.34800    72.78%
A10         -0.00818    0.00925    -0.88300    37.72%
A11          0.02178    0.01547     1.40800    15.91%
A12          0.00843    0.00590     1.42900    15.30%
A13          0.01519    0.00422     3.59900     0.03%

1           -0.01931    0.00875    -2.20800     2.72%
2           -0.02615    0.01832    -1.42700    15.36%
3           -0.02708    0.02375    -1.14000    25.43%
4            0.92663    0.01877    49.36800     0.00%
5           -0.00359    0.02945    -0.12200    90.29%
6            0.01104    0.03406     0.32400    74.59%
7            0.09446    0.02626     3.59700     0.03%
8            1.00261    0.04092    24.50400     0.00%
9           -0.01008    0.04758    -0.21200    83.21%
10          -0.02676    0.01217    -2.19900     2.79%
11          -0.00685    0.01769    -0.38700    69.88%
12           0.99160    0.02014    49.24200     0.00%

1            0.03544    0.00373     9.51000     0.00%
2            0.02612    0.00290     8.99200     0.00%
3            0.03368    0.00372     9.05100     0.00%
4            0.06457    0.00725     8.90500     0.00%
5            0.13351    0.01372     9.72800     0.00%
6            0.16425    0.01737     9.45500     0.00%
7            0.16619    0.05915     2.80900     0.50%
15           0.01944    0.02152     0.90300    36.65%
24          -0.01995    0.03042    -0.65600    51.18%
25           0.12629    0.05575     2.26500     2.35%
33           0.02938    0.02403     1.22300    22.13%
42           0.03229    0.03335     0.96800    33.30%
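As a reading aid for Table I, the sketch below recomputes the t-ratio and p-value columns for a few rows. The table itself does not say how the p-values were obtained; the reported figures appear consistent with two-sided p-values from a standard normal distribution, and that inference is the working assumption here. The helper names `t_ratio` and `two_sided_normal_p` are hypothetical.

```python
import math

def t_ratio(estimate, std_error):
    """Ratio of a point estimate to its standard error."""
    return estimate / std_error

def two_sided_normal_p(t):
    """Two-sided p-value under a standard normal reference distribution.

    Assumption of this sketch: the table's p-values are of this form.
    Uses the identity 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))

if __name__ == "__main__":
    # (parameter, estimate, standard error) copied from Table I
    rows = [
        ("A2", 0.14072, 0.15618),
        ("A5", -0.29095, 0.09579),
        ("A7", -0.10982, 0.04492),
    ]
    for name, est, se in rows:
        t = t_ratio(est, se)
        p = two_sided_normal_p(t)
        print(f"{name}: t = {t:.3f}, p = {100 * p:.2f}%")
```

If the assumption holds, the printed values should agree with the corresponding table entries up to rounding.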